# Cancer Registration (CANCER) Dataset
## 1. Summary
The information below is retrieved from the Health Data Gateway API developed by NHS England. Fields added by UK LLC are shown in italics.

In [1]:
# define the target dataset to document
schema = 'nhsd'
table = 'CANCER'
version = 'v0002'
# make the shared scripts folder importable, then load the documentation helper
import sys
script_fp = "../../../../scripts/"
sys.path.insert(0, script_fp)
from data_doc_helper import DocHelper
# create a DocHelper instance for this dataset
document = DocHelper(schema, table, version, script_fp)
# display and Markdown let code cells render formatted Markdown output
from IPython.display import display, Markdown

In [2]:
# get api data
dataset = document.get_api_data()
display(Markdown("**NHS England title of dataset:** "+dataset['datasetfields']['datautility']['title']))
display(Markdown("***Dataset name in UK LLC TRE:*** *nhsd.CANCER*"))  
display(Markdown("**Short abstract:** "+dataset['datasetfields']['abstract']))
display(Markdown("***Extended abstract:*** *Data are collected by the National Cancer Registration and Analysis Service (NCRAS), which is part of NHS England's [National Disease Registration Service (NDRS)](https://digital.nhs.uk/ndrs/). The Cancer Registration dataset is a subset of the Cancer Outcomes and Services Dataset (COSD), which is the national standard for collecting cancer data in the NHS. The Cancer Registration dataset includes all patients (adults and children) diagnosed with or receiving cancer treatment in or funded by the NHS in England since 1971. Data collected include demographic characteristics and information about diagnoses and treatments. Data are collected under section 251 of the NHS Act 2006. Patients may opt out of the Cancer Registry, but this is different from the [National data opt out](https://digital.nhs.uk/services/national-data-opt-out). NCRAS works closely with cancer charities to promote the value of population-based cancer registration; <1 in 10,000 cancer patients opt out of the registry. For further information see the [National Cancer Registration Dataset Data Resource Profile](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7124503/pdf/dyz076.pdf).*"))
display(Markdown("**Geographical coverage:** "+dataset['datasetfields']['geographicCoverage'][0]))
display(Markdown("**Temporal coverage:** "+dataset['datasetfields']['datasetStartDate']))
display(Markdown("***Data available in UK LLC TRE from:*** *01/01/1971 onwards*"))
display(Markdown("**Typical age range:** "+dataset['datasetfields']['ageBand']))
display(Markdown("**Collection situation:** "+dataset['datasetv2']['provenance']['origin']['collectionSituation'][0]))
display(Markdown("**Purpose:** "+dataset['datasetv2']['provenance']['origin']['purpose'][0]))
display(Markdown("**Source:** "+dataset['datasetv2']['provenance']['origin']['source'][0]))
display(Markdown("**Pathway:** "+dataset['datasetv2']['coverage']['pathway']))
display(Markdown("***Information collected:*** *Demographic characteristics and information about diagnoses and treatments.*"))  
display(Markdown("***Structure of dataset:*** *Each line represents one participant.*"))  
display(Markdown("***Update frequency in UK LLC TRE:*** *Quarterly*"))  
display(Markdown("***Dataset versions in UK LLC TRE:*** *TBC*"))
display(Markdown("***Data quality issues:*** *Data submitted to the Cancer Registry are checked using a combination of automated tools and manual review by cancer registration officers who have detailed knowledge of cancer biology, coding and terminology. Care should be taken when interpreting time series, because clinical and coding definitions of cancer have changed over time. For further details see the [National Cancer Registration Dataset Data Resource Profile](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7124503/pdf/dyz076.pdf).*"))  
display(Markdown("***Restrictions to data usage:*** *Medical purposes only (medical research) as defined in the NHS Act 2006: [https://www.legislation.gov.uk/ukpga/2006/41/part/13/crossheading/patient-information](https://www.legislation.gov.uk/ukpga/2006/41/part/13/crossheading/patient-information)*"))
display(Markdown("***Further information:*** *[https://digital.nhs.uk/ndrs/](https://digital.nhs.uk/ndrs/)*"))


**NHS England title of dataset:** Cancer Registration Data

***Dataset name in UK LLC TRE:*** *nhsd.CANCER*

**Short abstract:** PHE supply cancer registration data to NHS Digital. For linkage with other NHS Digital data to provide notifications on cancer status, be available to support research studies and to identify potential research participants for clinical trials.

***Extended abstract:*** *Data are collected by the National Cancer Registration and Analysis Service (NCRAS), which is part of NHS England's [National Disease Registration Service (NDRS)](https://digital.nhs.uk/ndrs/). The Cancer Registration dataset is a subset of the Cancer Outcomes and Services Dataset (COSD), which is the national standard for collecting cancer data in the NHS. The Cancer Registration dataset includes all patients (adults and children) diagnosed with or receiving cancer treatment in or funded by the NHS in England since 1971. Data collected include demographic characteristics and information about diagnoses and treatments. Data are collected under section 251 of the NHS Act 2006. Patients may opt out of the Cancer Registry, but this is different from the [National data opt out](https://digital.nhs.uk/services/national-data-opt-out). NCRAS works closely with cancer charities to promote the value of population-based cancer registration; <1 in 10,000 cancer patients opt out of the registry. For further information see the [National Cancer Registration Dataset Data Resource Profile](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7124503/pdf/dyz076.pdf).*

**Geographical coverage:** United Kingdom,England

**Temporal coverage:** 1971-01-01

***Data available in UK LLC TRE from:*** *01/01/1971 onwards*

**Typical age range:** 0-150

**Collection situation:** CLINIC

**Purpose:** DISEASE REGISTRY

**Source:** O

**Pathway:** Cancer

***Information collected:*** *Demographic characteristics and information about diagnoses and treatments.*

***Structure of dataset:*** *Each line represents one participant.*

***Update frequency in UK LLC TRE:*** *Quarterly*

***Dataset versions in UK LLC TRE:*** *TBC*

***Data quality issues:*** *Data submitted to the Cancer Registry are checked using a combination of automated tools and manual review by cancer registration officers who have detailed knowledge of cancer biology, coding and terminology. Care should be taken when interpreting time series, because clinical and coding definitions of cancer have changed over time. For further details see the [National Cancer Registration Dataset Data Resource Profile](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7124503/pdf/dyz076.pdf).*

***Restrictions to data usage:*** *Medical purposes only (medical research) as defined in the NHS Act 2006: [https://www.legislation.gov.uk/ukpga/2006/41/part/13/crossheading/patient-information](https://www.legislation.gov.uk/ukpga/2006/41/part/13/crossheading/patient-information)*

***Further information:*** *[https://digital.nhs.uk/ndrs/](https://digital.nhs.uk/ndrs/)*
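
The summary cell above concatenates API values directly into strings and indexes several fields with `[0]`. If a field is missing, is `None`, or is a plain string rather than a list, that pattern either raises an error or returns only the first character. The `Source` value `O` shown above is consistent with the latter, although the raw API response is not shown here, so this is an assumption. Below is a minimal sketch of a more defensive lookup. `get_field` and `first_item` are hypothetical helpers, not part of `DocHelper`, and the sample dictionary is invented for illustration.

```python
from functools import reduce


def get_field(record, path, default="Not available"):
    """Walk a nested dict along `path`; return `default` if a key is missing or None."""
    def step(current, key):
        # Only dicts can be descended into; anything else ends the walk with None.
        if isinstance(current, dict):
            return current.get(key)
        return None

    value = reduce(step, path, record)
    return default if value is None else value


def first_item(value, default="Not available"):
    """Return the first element of a list or tuple, or the value itself if it is a scalar."""
    if isinstance(value, (list, tuple)):
        return value[0] if value else default
    return value


# Invented sample shaped like the fields used above (not real API output).
sample = {
    "datasetfields": {"abstract": None, "geographicCoverage": ["England"]},
    "datasetv2": {"provenance": {"origin": {"source": "OTHER"}}},
}

coverage = first_item(get_field(sample, ["datasetfields", "geographicCoverage"]))
source = first_item(get_field(sample, ["datasetv2", "provenance", "origin", "source"]))
abstract = get_field(sample, ["datasetfields", "abstract"])
missing = get_field(sample, ["datasetfields", "ageBand"])

print(f"**Geographical coverage:** {coverage}")
print(f"**Source:** {source}")
print(f"**Short abstract:** {abstract}")
```

In a notebook, the formatted strings could be passed to `display(Markdown(...))` in place of `print`.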

## 2. Metrics
The tables below summarise the CANCER dataset in the UK LLC TRE.

**Table 1** The number of participants from each Longitudinal Population Study (LPS) represented in the CANCER dataset in the UK LLC TRE  
(**Note**: numbers relate to the most recent extract of NHS England data)

In [3]:
gb_cohort = document.get_cohort_count()
print(gb_cohort.to_markdown(index=False, tablefmt="fancy_grid"))

╒════════════════╤═════════╕
│ cohort         │   count │
╞════════════════╪═════════╡
│ ALSPAC         │     231 │
├────────────────┼─────────┤
│ BCS70          │     514 │
├────────────────┼─────────┤
│ BIB            │     740 │
├────────────────┼─────────┤
│ ELSA           │    1678 │
├────────────────┼─────────┤
│ EPICN          │    4942 │
├────────────────┼─────────┤
│ EXCEED         │    1491 │
├────────────────┼─────────┤
│ FENLAND        │    1461 │
├────────────────┼─────────┤
│ GLAD           │    2002 │
├────────────────┼─────────┤
│ MCS            │    1157 │
├────────────────┼─────────┤
│ NCDS58         │    1056 │
├────────────────┼─────────┤
│ NEXTSTEP       │     129 │
├────────────────┼─────────┤
│ NIHRBIO_COPING │    1793 │
├────────────────┼─────────┤
│ NSHD46         │    1059 │
├────────────────┼─────────┤
│ TEDS           │       0 │
├────────────────┼─────────┤
│ TRACKC19       │    1124 │
├────────────────┼─────────┤
│ TWINSUK        │    2669 │
├─────────────

## 3. Helpful syntax
Below we will include syntax that may be helpful to other researchers in the UK LLC TRE. For longer scripts, we will include a snippet of the code plus a link to Git where you can find the full script. 
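
As a starting point, per-LPS counts like those in Table 1 could be approximated by counting distinct participants per cohort. The sketch below is illustrative only and is not the code behind `get_cohort_count()`. The column names `cohort` and `participant_id` and the sample rows are assumptions, so substitute the names used in your own TRE extract.

```python
from collections import defaultdict


def count_participants_by_cohort(rows, cohort_key="cohort", id_key="participant_id"):
    """Count distinct participant IDs per cohort, sorted by cohort name."""
    ids_by_cohort = defaultdict(set)
    for row in rows:
        # A set keeps each participant once, even if they appear in several rows.
        ids_by_cohort[row[cohort_key]].add(row[id_key])
    return {cohort: len(ids) for cohort, ids in sorted(ids_by_cohort.items())}


# Invented rows for illustration; real data must stay inside the TRE.
rows = [
    {"cohort": "ALSPAC", "participant_id": "a1"},
    {"cohort": "ALSPAC", "participant_id": "a2"},
    {"cohort": "BCS70", "participant_id": "b1"},
    {"cohort": "ALSPAC", "participant_id": "a1"},  # duplicate row, counted once
]

counts = count_participants_by_cohort(rows)
for cohort, count in counts.items():
    print(f"{cohort}: {count}")
```

Cohorts with no rows in the extract do not appear in the result, so a zero-count row would need to be added separately from a list of all cohorts.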